Missing data imputation, classification, prediction and average treatment effect estimation via Random Recursive Partitioning
Abstract
In this paper we describe some applications of the Random Recursive Partitioning (RRP) method. The method generates a proximity matrix that can be used in nonparametric hot-deck missing data imputation, classification, prediction, average treatment effect estimation and, more generally, in matching problems. RRP is a Monte Carlo procedure that randomly generates non-empty recursive partitions of the data and evaluates the proximity between two observations as the empirical frequency with which they fall in the same cell of these random partitions over all the replications. RRP also works in the presence of missing data and is invariant under monotonic transformations of the data. No other formal properties of the method are known yet, so Monte Carlo experiments are provided to explore its performance. Companion software is available as a package for the R statistical environment.
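The proximity idea in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm of the paper or of the companion R package: the splitting rule (a random variable and a random cut point chosen among the observed values), the function names `random_partition` and `rrp_proximity`, and the parameters `n_rep`, `min_size` and `seed` are all assumptions made for illustration, and the sketch assumes complete numeric data with no missing values.

```python
import numpy as np


def random_partition(X, rng, min_size=5):
    """Assign each row of X to a cell of one random recursive partition.

    Hypothetical splitting rule: pick a random column and a random cut
    point among the observed values, so both child cells are non-empty.
    """
    n, p = X.shape
    labels = np.zeros(n, dtype=int)
    next_label = 1
    stack = [np.arange(n)]
    while stack:
        idx = stack.pop()
        if idx.size < 2 * min_size:
            continue  # cell too small to split further
        j = rng.integers(p)
        values = np.unique(X[idx, j])  # sorted distinct values
        if values.size < 2:
            continue  # nothing to split on in this column
        # The cut depends only on the rank of the values, not their scale.
        cut = values[rng.integers(values.size - 1)]
        left = idx[X[idx, j] <= cut]
        right = idx[X[idx, j] > cut]
        if left.size < min_size or right.size < min_size:
            continue  # reject the split and keep the cell as a leaf
        labels[right] = next_label  # left keeps the parent label
        next_label += 1
        stack.extend([left, right])
    return labels


def rrp_proximity(X, n_rep=200, min_size=5, seed=0):
    """Fraction of random partitions in which two rows share a cell."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    prox = np.zeros((n, n))
    for _ in range(n_rep):
        labels = random_partition(X, rng, min_size)
        prox += labels[:, None] == labels[None, :]
    return prox / n_rep
```

Because the cut points in this sketch depend only on the ordering of the values, applying a strictly increasing transformation to the data and reusing the same seed yields the same proximity matrix, which mirrors the invariance property stated in the abstract. A row of the matrix could then be used to select donors for hot-deck imputation or matched units, for example by taking the observations with the highest proximity.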
Similar resources
Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques
Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...
Analysis of missing observations in a study of the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy
Introduction: The aim of this study was to impute missing data and to compare the effect of different doses of vitamin D supplementation on insulin resistance during pregnancy. Methods: A clinical trial study was done on 104 women with diabetes and gestational age less than 12 weeks between 1391 and...
Evaluating the accuracy of genomic prediction under different genomic architectures of quantitative and threshold traits with imputation of simulated genomic data using the random forest method
Genomic selection is a promising challenge for discovering genetic variants influencing quantitative and threshold traits for improving the genetic gain and accuracy of genomic prediction in animal breeding. Since a proportion of genotypes are generally uncalled, prediction of genomic accuracy requires imputation of missing genotypes. The objectives of this study were (1) to quantify...
Partially linear varying coefficient models with missing at random responses
This paper considers partially linear varying coefficient models when the response variable is missing at random. The paper uses imputation techniques to develop an omnibus specification test. The test is based on a simple modification of a Cramér-von Mises functional that overcomes the curse of dimensionality often associated with the standard Cramér-von Mises functional. The paper also consid...